Abstract

Flip a coin repeatedly, and stop whenever you want. Your payoff 
is the proportion of heads, and you wish to maximize this payoff in 
expectation. This so-called Chow-Robbins game is amenable to com- 
puter analysis, but while simple-minded number crunching can show 
that it is best to continue in a given position, establishing rigorously 
that stopping is optimal seems at first sight to require "backward
induction from infinity".

We establish a simple upper bound on the expected payoff in a 
given position, allowing efficient and rigorous computer analysis of 
positions early in the game. In particular we confirm that with 5 
heads and 3 tails, stopping is optimal. 

1 The Chow-Robbins game 

The following game was introduced by Yuan-Shih Chow and Herbert Robbins
[1] in 1964: We toss a coin repeatedly, and stop whenever we want. Our payoff
is the proportion of heads up to that point, and we assume that we want to 
maximize the expected payoff. 

Basic properties of this game, like the fact that there is an optimal strategy
that stops with probability 1, were established in [1]. Precise asymptotic
results were obtained by Aryeh Dvoretzky [2] and Larry Shepp [4].
In particular Shepp showed that for the optimal strategy, the proportion of
heads required for stopping after n coin tosses is asymptotically

$$\frac{1}{2} + \frac{0.41996\ldots}{\sqrt{n}},$$

where the constant is the root of a certain integral equation. But as was
pointed out more recently by Luis Medina and Doron Zeilberger [3], for a
number of positions early in the game the optimal decisions were still not
known rigorously.

Let V(a, n) be the expected payoff under optimal play from position 
(a, n), by which we mean a heads out of n coin flips. The game is suitable for 
computer analysis, but there is a fundamental problem in that it seems one 
has to do "backward induction from infinity" in order to determine V(a,n). 
Clearly

$$V(a,n) = \max\left(\frac{a}{n},\ \frac{V(a+1,n+1) + V(a,n+1)}{2}\right), \tag{1}$$

but the "base case" is at infinity.



2 Lower bound on V(a, n) 

In position (a, n) we can guarantee payoff a/n by stopping. Moreover, if 
a/n < 1/2, then by the recurrence of simple random walk on Z, we can wait 
until the proportion of heads is at least 1/2. Therefore 

$$V(a,n) \ge \max\left(\frac{a}{n},\ \frac{1}{2}\right). \tag{2}$$

We can recursively establish better lower bounds by starting from the
inequality (2) at a given "horizon", and then working our way backwards using
(1). An obvious approach is letting the horizon consist of all positions with
n = N for some fixed N. In practice it is more efficient to use (1) only for
positions where in addition a ≈ n/2, say when $|a - n/2| \le c\sqrt{N}$ for some
suitable constant c, and to resort to (2) outside that range. This allows a
greater value of N at given computational resources.

If in this way we find that V(a,n) > a/n, then in position (a, n), contin- 
uing is better than stopping. For instance it is straightforward to check (see 
the discussion in [3]) that V(2,3) > 2/3, from which it follows that with 2 
heads versus 1 tail, we should continue.
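
The backward induction described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `lower_bound` is hypothetical, the horizon consists of all positions with n = N (no restriction to a band around a = n/2), and ordinary floating-point arithmetic is used, so the output is not a rigorous bound.

```python
def lower_bound(a0, n0, N):
    """Lower bound on V(a0, n0) by backward induction from the horizon n = N.

    At the horizon the bound (2), max(a/n, 1/2), is used; below the
    horizon the recursion (1) is applied. Plain floats, so the result
    is illustrative rather than rigorous.
    """
    # row[a] holds the current lower bound on V(a, n), starting with n = N
    row = [max(a / N, 0.5) for a in range(N + 1)]
    for n in range(N - 1, n0 - 1, -1):
        new_row = []
        for a in range(n + 1):
            stop = a / n if n > 0 else 0.0       # no payoff exists before the first flip
            cont = 0.5 * (row[a + 1] + row[a])   # fair coin: heads or tails next
            # (2) stays valid below the horizon, so 1/2 can be kept in the max
            new_row.append(max(stop, 0.5, cont))
        row = new_row
    return row[a0]
```

Comparing `lower_bound(a, n, N)` with a/n for increasing N indicates when continuing beats stopping: once the computed value is strictly larger than a/n (by more than floating-point error), continuing is better in position (a, n).
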

The third column of Table 1 (in the Appendix) shows positions for which
we have determined that continuing is better than stopping. These results are
based on a calculation with a horizon stretching out to $n = 10^7$. They agree
with [3, Section 5] with one exception: Medina and Zeilberger conjecture,
based on calculations with a horizon of 50000, that, in the notation of [2, 3, 4],
$\beta_{127} = 9$, meaning that the difference (number of heads minus number of
tails) required in order to stop after 127 flips is 9. Accordingly they suggest
stopping with 68-59, but our computation shows that continuing is slightly
better.

On the other hand, in order to conclude that stopping is ever optimal, 
we need a nontrivial upper bound on V(a,n). Clearly such an upper bound 
cannot come from (1) alone, since that equation is satisfied by V(a,n) = 1.

3 Upper bound on V(a, n) 

We let $\tilde V(a,n)$ be the expected payoff from position (a, n) under infinite
clairvoyance, that is, assuming we have complete knowledge of the results
of the future coin flips and stop when we reach the maximum proportion of
heads. Obviously $V(a,n) \le \tilde V(a,n)$, so that any upper bound on $\tilde V(a,n)$ is
also an upper bound on V(a,n).

Theorem 3.1.

$$\tilde V(a,n) \le \max\left(\frac{a}{n},\ \frac{1}{2}\right) + \min\left(\frac{1}{4}\sqrt{\frac{\pi}{n}},\ \frac{1}{2\,|2a-n|}\right). \tag{3}$$

The first term of the right-hand side of (3) is equal to the lower bound
(2), and thus the second term bounds the error in that approximation. The
proof of Theorem 3.1 consists of Lemma 4.1 together with some calculations
in the rest of Section 4.

Let us already here describe how we have used (3) computationally. We
have computed upper bounds on V(a, n) in a box stretching out to
$n \le N = 10^7$, and with height given by $|2a - n| \le h$ for a fixed h (thus the
box includes points where a deviates from n/2 by at most h/2). At the
positions on the "boundary" of the box (more precisely, where (a + 1, n + 1)
or (a, n + 1) is outside the box), V(a,n) has been estimated by (3), whereas
for the positions in the interior we have used (1), controlling the arithmetic
so that all roundings go up, in order to achieve rigorous upper bounds.
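
A minimal Python sketch of this upper-bound computation follows. It deliberately simplifies the procedure described above: the names `theorem_bound` and `upper_bound` are hypothetical, the bound (3) is applied on the whole row n = N instead of on the boundary of a box, and the arithmetic is ordinary floating point without directed rounding, so the result is indicative only and not a rigorous bound.

```python
import math

def theorem_bound(a, n):
    """Right-hand side of (3): max(a/n, 1/2) + min(sqrt(pi/n)/4, 1/(2|2a-n|))."""
    d = abs(2 * a - n)
    away = 1.0 / (2 * d) if d > 0 else math.inf   # second bound is vacuous on the line a = n/2
    return max(a / n, 0.5) + min(0.25 * math.sqrt(math.pi / n), away)

def upper_bound(a0, n0, N):
    """Upper bound on V(a0, n0): (3) at the horizon n = N, recursion (1) below it."""
    row = [theorem_bound(a, N) for a in range(N + 1)]
    for n in range(N - 1, n0 - 1, -1):
        new_row = []
        for a in range(n + 1):
            stop = a / n if n > 0 else 0.0        # no payoff exists before the first flip
            cont = 0.5 * (row[a + 1] + row[a])    # average over the next coin flip
            new_row.append(max(stop, cont))
        row = new_row
    return row[a0]
```

If `upper_bound(a, n, N)` comes out no larger than a/n, this suggests that stopping is optimal in position (a, n); turning that into a proof requires the controlled rounding mentioned in the text.
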

The second term of the right-hand side of (3) gives two different upper
bounds on the error in (2), where the bound $(1/4)\sqrt{\pi/n}$ is better close
to the line a = n/2, while $1/(2\,|2a - n|)$ is the sharper one away from that
line. It seemed natural to choose the height h of the box in such a way that
these two bounds approximately coincide at the farther corners of the box,
in other words so that

$$\frac{1}{4}\sqrt{\frac{\pi}{N}} \approx \frac{1}{2h},$$

that is, $h \approx (2/\sqrt{\pi}) \cdot \sqrt{N}$. In our computations leading to the results of
Table 1 (with $N = 10^7$), we have taken h = 3568. The second column
of Table 1 lists positions for which we have determined that stopping is
optimal. This includes 5 heads to 3 tails, a position discussed in [3] and for
which computational evidence [3, 6] strongly suggested that stopping should
be optimal. To the best of our knowledge our computation provides the first 
rigorous verification of this fact. 
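
As a quick arithmetic check of the box height, the following Python lines evaluate the balance condition between the two error bounds for N = 10^7. The function name `box_height` is a hypothetical helper introduced only for this illustration.

```python
import math

def box_height(N):
    """Height h at which (1/4)*sqrt(pi/N) and 1/(2h) coincide: h = (2/sqrt(pi))*sqrt(N)."""
    return (2 / math.sqrt(math.pi)) * math.sqrt(N)

N = 10**7
h = box_height(N)
central_bound = 0.25 * math.sqrt(math.pi / N)   # first bound in (3) at n = N
corner_bound = 1 / (2 * h)                      # second bound in (3) at |2a - n| = h
print(round(h), central_bound, corner_bound)
```

Rounding h to the nearest integer gives the value 3568 used in the text, and h/2 is the maximal deviation of a from n/2 inside the box.
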



4 Proof of Theorem 3.1 



For a and n as before, and $p \in [0,1]$, let P(a,n,p) denote the probability
that, starting from position (a, n), at some point now or in the future the
total proportion of heads will strictly exceed p. In other words P(a,n,p) is
the probability of success starting from (a, n) if instead of trying to maximize
expected payoff, we try to achieve a proportion of heads exceeding p, and
continue as long as this has not been achieved. When p is rational, P(a,n,p)
is algebraic and can in principle be calculated with the method of [5], but we
need an inequality that can be analyzed averaging over p.

Lemma 4.1. Suppose that in position (a,n), the nonnegative integer k is 
such that at least k more coin flips will be required in order to obtain a 
proportion of heads exceeding p. Then 

$$P(a,n,p) \le \frac{1}{(2p)^k}. \tag{4}$$

Proof. We can assume that p > max(a/n, 1/2), since otherwise the statement
is trivial. From position (a, n) condition on the event that the total
proportion of heads will at some later point exceed p. Then, by the law of large
numbers, there must be a maximal m such that after a total of m coin flips
the proportion of heads exceeds p. Conditioning further on m, the number of
heads in coin flips number n + 1, ..., m is determined, and all permutations
of the outcomes of these m − n coin flips are equally likely. The proportion
of heads among these coin flips is at least p, so the (conditional) probability
that coin flip n + 1 results in heads is at least p. If k > 1, then if coin flip
n + 1 was heads, the proportion of heads in flips n + 2, ..., m is still at least
p, so the probability of heads-heads in flips n + 1 and n + 2 is at least $p^2$, etc.
Therefore the (conditional) probability that flips n + 1, ..., n + k all result
in heads is at least $p^k$, and since this holds for every m, we don't have to
condition on a specific m, but only on the event that the proportion of heads
will exceed p at some point.

Since the unconditional probability of k consecutive heads is $1/2^k$, the
statement now follows from a simple calculation: On one hand,

$$\Pr(k \text{ consecutive heads} \mid \text{proportion } p \text{ is eventually exceeded}) \ge p^k.$$

On the other hand,

$$\Pr(k \text{ consecutive heads} \mid \text{proportion } p \text{ is eventually exceeded}) \le \frac{\Pr(k \text{ consecutive heads})}{\Pr(\text{proportion } p \text{ eventually exceeded})} = \frac{(1/2)^k}{P(a,n,p)}.$$

Rearranging, we obtain (4). □
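
Lemma 4.1 can be sanity-checked numerically. The sketch below is an illustration only, with hypothetical helper names: `exceed_probability` computes, in exact rational arithmetic, the probability that the proportion of heads exceeds p within a limited number of further flips (which is a lower estimate of P(a,n,p)), and `flips_required` finds the largest k to which the lemma applies. It assumes n > 0 and p < 1.

```python
from fractions import Fraction

def exceed_probability(a, n, p, extra_flips):
    """Probability that the proportion of heads strictly exceeds p at (a, n)
    or within `extra_flips` further flips; a lower estimate of P(a, n, p)."""
    if a > p * n:
        return Fraction(1)
    alive = {a: Fraction(1)}    # heads count -> probability of paths not yet above p
    success = Fraction(0)
    for m in range(n + 1, n + extra_flips + 1):
        nxt = {}
        for heads, prob in alive.items():
            half = prob / 2
            for new_heads in (heads, heads + 1):   # tails or heads on flip m
                if new_heads > p * m:
                    success += half
                else:
                    nxt[new_heads] = nxt.get(new_heads, Fraction(0)) + half
        alive = nxt
    return success

def flips_required(a, n, p):
    """Smallest number of further flips after which the proportion can exceed p."""
    k = 0
    while not (a + k > p * (n + k)):
        k += 1
    return k

p = Fraction(7, 10)
k = flips_required(5, 8, p)
estimate = exceed_probability(5, 8, p, 60)
bound = 1 / (2 * p) ** k        # right-hand side of (4)
print(k, float(estimate), float(bound))
```

Since the truncated probability can only underestimate P(a,n,p), this does not prove the lemma; it merely shows the inequality (4) holding with room to spare in a small example.
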

Our next task is to use Lemma 4.1 to estimate $\tilde V(a,n)$. We have

$$\tilde V(a,n) = \int_0^1 P(a,n,p)\,dp = \max\left(\frac{a}{n},\ \frac{1}{2}\right) + \int_{\max(a/n,\,1/2)}^{1} P(a,n,p)\,dp. \tag{5}$$
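
The identity (5) is an instance of the tail-integral formula for the expectation of a random variable with values in [0, 1]. The following worked step spells this out; the symbol M (the supremum of the proportion of heads from position (a, n) onwards) is notation introduced here for illustration and does not appear in the original argument.

```latex
% M = supremum, over the present and all future times, of the proportion of heads.
% Under infinite clairvoyance the payoff is M, and \{M > p\} is exactly the event
% that the proportion strictly exceeds p at some point, so \Pr(M > p) = P(a,n,p).
\begin{align*}
\tilde V(a,n) = \mathbb{E}[M]
  &= \int_0^1 \Pr(M > p)\,dp
   = \int_0^1 P(a,n,p)\,dp \\
  &= \int_0^{\max(a/n,\,1/2)} 1\,dp + \int_{\max(a/n,\,1/2)}^{1} P(a,n,p)\,dp .
\end{align*}
% P(a,n,p) = 1 for p < a/n (the proportion already exceeds p) and for p < 1/2
% (by recurrence of the simple random walk the proportion reaches at least 1/2).
```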

If p > max(a/n, 1/2), then the requirement that at least k more coin flips
are needed to obtain a proportion of heads exceeding p is equivalent to

$$\frac{a + k - 1}{n + k - 1} \le p,$$

which we rearrange as

$$k \le 1 + \frac{np - a}{1 - p}.$$

Since there is always an integer k in the interval

$$\frac{np - a}{1 - p} \le k \le 1 + \frac{np - a}{1 - p},$$

we conclude using Lemma 4.1 that for p in the range max(a/n, 1/2) < p < 1
of integration in (5),

$$P(a,n,p) \le \frac{1}{(2p)^{\frac{np-a}{1-p}}}.$$

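It follows that

$$\tilde V(a,n) \le \max\left(\frac{a}{n},\ \frac{1}{2}\right) + \int_{\max(a/n,\,1/2)}^{1} \frac{dp}{(2p)^{\frac{np-a}{1-p}}}.$$

By the substitution 2p = 1 + t and the elementary inequality

$$\frac{\log(1+t)}{1-t} \ge t \qquad (0 \le t < 1),$$

we obtain

$$\begin{aligned}
\tilde V(a,n) &\le \max\left(\frac{a}{n},\ \frac{1}{2}\right) + \frac{1}{2}\int_{\max\left(\frac{2a-n}{n},\,0\right)}^{1} \frac{dt}{(1+t)^{\frac{(1+t)n-2a}{1-t}}} \\
&= \max\left(\frac{a}{n},\ \frac{1}{2}\right) + \frac{1}{2}\int_{\max\left(\frac{2a-n}{n},\,0\right)}^{1} \exp\left(-\frac{(1+t)n-2a}{1-t}\cdot\log(1+t)\right) dt \\
&\le \max\left(\frac{a}{n},\ \frac{1}{2}\right) + \frac{1}{2}\int_{\max\left(\frac{2a-n}{n},\,0\right)}^{1} \exp\left(-(1+t)tn + 2at\right) dt.
\end{aligned} \tag{6}$$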

By putting $u = t\sqrt{n}$ and replacing the upper bound of integration by infinity,
we arrive at

$$\tilde V(a,n) \le \max\left(\frac{a}{n},\ \frac{1}{2}\right) + \frac{1}{2\sqrt{n}}\int_{\max\left(\frac{2a-n}{\sqrt{n}},\,0\right)}^{\infty} \exp\left(-u^2 + \frac{2a-n}{\sqrt{n}}\cdot u\right) du. \tag{7}$$

Now notice that by the substitution $w = u - (2a - n)/\sqrt{n}$,

$$\int_{\frac{2a-n}{\sqrt{n}}}^{\infty} \exp\left(-u^2 + \frac{2a-n}{\sqrt{n}}\cdot u\right) du = \int_0^{\infty} \exp\left(-w^2 - \frac{2a-n}{\sqrt{n}}\cdot w\right) dw. \tag{8}$$

Therefore regardless of the sign of 2a − n, (7) can be written as

$$\tilde V(a,n) \le \max\left(\frac{a}{n},\ \frac{1}{2}\right) + \frac{1}{2\sqrt{n}}\int_0^{\infty} \exp\left(-u^2 - \frac{|2a-n|}{\sqrt{n}}\cdot u\right) du. \tag{9}$$

The bound (9) can be used directly in computations by first tabulating
values of the integral, but we have chosen to simplify the error term further
(instead spending computer resources on pushing the horizon). We can
discard either of the two terms inside the exponential in (9). On one hand, the
error term is at most

$$\frac{1}{2\sqrt{n}}\int_0^{\infty} \exp\left(-u^2\right) du = \frac{1}{4}\sqrt{\frac{\pi}{n}}.$$

On the other hand, it is also bounded by

$$\frac{1}{2\sqrt{n}}\int_0^{\infty} \exp\left(-\frac{|2a-n|}{\sqrt{n}}\cdot u\right) du = \frac{1}{2\cdot|2a-n|}.$$

This completes the proof of Theorem 3.1.

In the latter case, |2a − n| is the absolute difference between the number
of heads and the number of tails. The simplicity of the inequality
$\tilde V(a,n) \le \max(a/n, 1/2) + 1/(2\,|2a - n|)$ suggests that there might be a proof involving
considerably less calculation.
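
To see how much is lost by simplifying the error term, the integral in (9) can be evaluated in closed form by completing the square, which gives (√π/2)·exp(c²/4)·erfc(c/2) with c = |2a − n|/√n. The Python sketch below compares this with the simplified error term of (3). The function names are hypothetical, and the closed form is evaluated naively, so it is only suitable for moderate values of c (for very large c the exponential overflows).

```python
import math

def error_term_9(a, n):
    """Error term of (9): (1/(2 sqrt n)) * integral_0^inf exp(-u^2 - c*u) du,
    where c = |2a - n| / sqrt(n)."""
    c = abs(2 * a - n) / math.sqrt(n)
    # completing the square: integral = (sqrt(pi)/2) * exp(c^2/4) * erfc(c/2)
    integral = 0.5 * math.sqrt(math.pi) * math.exp(c * c / 4) * math.erfc(c / 2)
    return integral / (2 * math.sqrt(n))

def error_term_3(a, n):
    """Simplified error term of (3): min(sqrt(pi/n)/4, 1/(2|2a - n|))."""
    d = abs(2 * a - n)
    away = 1 / (2 * d) if d else math.inf
    return min(0.25 * math.sqrt(math.pi / n), away)

for a, n in [(5, 8), (16, 28), (50, 100), (0, 4)]:
    print((a, n), error_term_9(a, n), error_term_3(a, n))
```

In each case the first number is no larger than the second, as the derivation requires; on the line a = n/2 the two coincide.
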

Theorem 3.1 allows us to calculate V(a,n) to any desired precision. This
is because (1) has the property that if V(a,n + 1) and V(a + 1,n + 1) are
both known with an error of at most ε, then the same is true of V(a,n).
To obtain the desired level of precision, we therefore only need to start our
calculation from a horizon where the error term in (3) is sufficiently small.
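
The error-propagation property of (1) can be written out as a short worked step. The symbols L and U (lower and upper bounds at the two successor positions) are notation introduced here for illustration.

```latex
% Assume L_1 \le V(a+1,n+1) \le U_1 and L_2 \le V(a,n+1) \le U_2
% with U_i - L_i \le \varepsilon for i = 1, 2.
% Since \max(x,y) - \max(x,y') \le \max(0,\, y - y'), applying (1) to the bounds gives
\begin{align*}
0 \le \max\!\left(\frac{a}{n},\ \frac{U_1 + U_2}{2}\right)
      - \max\!\left(\frac{a}{n},\ \frac{L_1 + L_2}{2}\right)
  \le \frac{(U_1 - L_1) + (U_2 - L_2)}{2}
  \le \varepsilon .
\end{align*}
% So the gap between upper and lower bounds never grows on the way down from the horizon.
```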

On the other hand it is difficult to say in advance how far we have to take 
our computations in order to find the optimal decision in a given position, 
as the expected payoff on continuing may be very close to the payoff a/n on
stopping. For instance, we have no idea how hard it will be to find the opti- 
mal decision in the position 116-104 (the first one whose status we haven't 
determined). For all we know the question whether stopping is optimal in 
this position might be undecidable by our method, although this would re- 
quire the expected payoff on continuing to miraculously be exactly equal to 
the payoff on stopping. 

5 Appendix: Computational results



We have computed upper and lower bounds on V(a, n) for (a, n) satisfying
$n \le 10^7$ and $|a - n/2| \le 1784 \approx \sqrt{10^7/\pi}$. These results allow us to find the
optimal decision in most positions early in the game. It is better to continue
precisely when V(a, n) > a/n, while stopping is optimal when V(a, n) = a/n.

We have included the results relevant to a total of at most 1000 coin 
flips, and in this range we have determined optimal play for all except seven 
positions. 

If the number a of heads is not greater than n/2, continuing is always 
better than stopping. If a > n/2, then to read Table 1, consider the difference
2a − n = a − (n − a) between the number of heads and the number of tails. It turns
out (as is easily shown by a coupling argument) that for a fixed difference, 
the optimal decision will be to stop if n is below a certain threshold, and to 
continue if n is above that threshold. 

If for instance we have 19 heads against 14 tails, the difference is 5. 
According to the table, stopping is best even up to 23-18, so we stop. As 
can be seen in the table, the opening theory is complete up to difference 11, 
while for difference 12 the status of the position 116-104 is still unknown. 

For the position 16-12, the decision is extremely close, and a run with
$N = 10^6$ fails to determine the optimal decision, giving an upper bound
of 0.5714326 on continuing compared to the payoff 16/28 ≈ 0.57142857 on
stopping. A run with $N = 10^7$ shows that the expected payoff on continuing
is between 0.5714192 and 0.5714278, revealing that stopping is optimal.

For V(0,0), Julian Wiseman gives the lower bound 0.7929534812 based
on a calculation [5] much more extensive than ours (with a horizon of
$N = 2^{28} \approx 268{,}000{,}000$) and suggests 0.79295350640 as an approximation of the
true value. Our bounds obtained with $N = 10^7$ are

$$0.79295301268091 \le V(0,0) \le 0.79295559864361.$$

difference    stop with    but go with
     1        1-0          2-1
     2        5-3          6-4
     3        9-6          10-7
     4        16-12        17-13
     5        23-18        24-19
     6        32-26        33-27
     7        42-35        43-36
     8        54-46        55-47
     9        67-58        68-59
    10        82-72        83-73
    11        98-87        99-88
    12        115-103      117-105
    13        134-121      135-122
    14        155-141      156-142
    15        176-161      177-162
    16        199-183      201-185
    17        224-207      225-208
    18        250-232      251-233
    19        277-258      279-260
    20        306-286      307-287
    21        336-315      338-317
    22        368-346      369-347
    23        401-378      402-379
    24        435-411      437-413
    25        471-446      473-448
    26        508-482      510-484
  ≥ 27        stop

Table 1: Opening theory for the first 1000 steps of the Chow-Robbins game. 
If the difference (number of heads minus number of tails) is non-positive, we
always continue. If the difference is 27 or more and the total number of flips 
is at most 1000, stopping is optimal. For differences from 1 to 26, stopping 
is optimal up to and including the position in column 2, while continuing is 
optimal from the position in column 3 and on. There are seven positions in 
this range for which we have not determined the optimal decision: 116-104, 
200-184, 278-259, 337-316, 436-412, 472-447 and 509-483. 
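
As a usage illustration, Table 1 can be encoded as a small lookup. The sketch below is not part of the original computations: the names `TABLE` and `decide` are hypothetical, and the data are simply the head counts from columns 2 and 3 of the table, keyed by the difference.

```python
# difference -> (heads in the last "stop" position, heads in the first "go" position)
TABLE = {
    1: (1, 2), 2: (5, 6), 3: (9, 10), 4: (16, 17), 5: (23, 24),
    6: (32, 33), 7: (42, 43), 8: (54, 55), 9: (67, 68), 10: (82, 83),
    11: (98, 99), 12: (115, 117), 13: (134, 135), 14: (155, 156),
    15: (176, 177), 16: (199, 201), 17: (224, 225), 18: (250, 251),
    19: (277, 279), 20: (306, 307), 21: (336, 338), 22: (368, 369),
    23: (401, 402), 24: (435, 437), 25: (471, 473), 26: (508, 510),
}

def decide(heads, tails):
    """Optimal decision according to Table 1, valid for at most 1000 flips."""
    if heads + tails > 1000:
        raise ValueError("Table 1 only covers the first 1000 flips")
    difference = heads - tails
    if difference <= 0:
        return "continue"
    if difference >= 27:
        return "stop"
    stop_heads, go_heads = TABLE[difference]
    if heads <= stop_heads:
        return "stop"
    if heads >= go_heads:
        return "continue"
    return "undetermined"   # falls strictly between columns 2 and 3

print(decide(19, 14), decide(68, 59), decide(116, 104))
```

For example, `decide(19, 14)` reproduces the case of 19 heads against 14 tails discussed in the Appendix, and the positions for which the function reports "undetermined" are exactly the seven listed in the caption.
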